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Abstract. The importance of manifolds and Riemannian geometry is spreading to applied fields 
in which the need to model non-linear structure has spurred wide-spread interest in geometry. The 
transfer of interest has created demand for methods for computing classical constructs of geometry 
on manifolds occurring in practical applications. This paper develops initial value problems for the 
computation of the differential of the exponential map and Jacobi fields on parametrically and im- 
plicitly represented manifolds. It is shown how the solution to these problems allow for determining 
sectional curvatures and provides upper bounds for injectivity radii. In addition, when combined 
with the second derivative of the exponential map, the initial value problems allow for numerical 
computation of Principal Geodesic Analysis, a non-linear version of the Principal Component Anal- 
ysis procedure for estimating variability in datasets. The paper develops algorithms for computing 
Principal Geodesic Analysis without the tangent space approximation previously used and, thereby, 
provides an example of how the constructs of theoretical geometry apply to solving problems in 
statistics. By testing the algorithms on synthetic datasets, we show how curvature affects the result 
of PGA. 
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1. Introduction. Manifolds, sets locally modeled by Euclidean spaces, have 
a long and intriguing history in mathematics, and topological, differential geometric, 
and Riemannian geometric properties of manifolds have been studied extensively with 
results extending far beyond the fields of manifolds themselves. The introduction of 
high-performance computing in applied fields have widened the use of manifolds, and 
Riemannian manifolds, in particular, are now used for modeling a range of prob- 
lems possessing non-linear structure. Applications include shape modeling (complex 
projective shape spaces [23] and medial representations of surfaces [UED]), imaging 
(tensor manifolds in diffusion tensor imaging [IIJ [12l [31] and image segmentation and 
registration [HE^Di anc ^ severa l other fields (forestry [TS], human motion modeling 

Era Sana). 

To fully utilize the power of manifolds in modeling, it is essential to develop 
fast and robust algorithms for computing various manifold constructions. Comput- 
ing intrinsic distances, Jacobi fields, curvatures, and injectivity radii pose important 
problems |18j as well as solving optimization problems posed on manifolds or in man- 
ifold tangent spaces and defining and computing manifold generalizations of common 
Euclidean space statistics. The papers 16, 25[ [2H [Ml 133 address first-order man- 
ifold problems, and certain second-order problems have been considered but mainly 
on limited classes of manifolds [5] . Generalizing linear statistics has been the focus of 
the papers EH M H3 EH UM • 

In this article, we study the second-order problems arising from variations of the 
initial velocity of geodesies. This will allow us to compute structures fundamental 
to geometry and to numerically solve certain optimization problems posed in tangent 
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spaces of manifolds. The developed methods apply to manifolds represented both 
parametrically and implicitly without preconditions such as knowledge of explicit 
formulas for geodesies. Hence, in addition to being interesting from a geometrical and 
computational point of view, the algorithms will be useful for applications in several 
of the mentioned areas. 

To exemplify this, we consider the problem of capturing the variation of a set 
of manifold valued data. The well-known Principal Component Analysis procedure 
(PCA) has been generalized to manifold valued data with the introduction of Principal 
Geodesic Analysis (PGA, [TO])- The construction is the source of continuing interest 
from both application oriented authors and the statistical community, most recently 
with the development of Geodesic PCA (GPCA, [IS]). Both PGA and GPCA have 
been used successfully for a number of applications [UJ HU HE] HU [35] [3j|] ■ 

Until now, there were no algorithm for numerically computing PGA for general 
manifolds. Linear approximations have been used instead except for special classes 
of manifolds where geodesies have explicit analytical formulas [SUITS]. Because PGA 
is posed as an optimization problem in the tangent space of the manifold, the tools 
developed here apply to computing it without linearizing the manifold. We will show 
how those tools allow us to compute exact PGA for a wide range of manifolds under 
some assumptions on the optimization problems. 

1.1. Related Work. A vast body of mathematical literature describes mani- 
folds and Riemannian structures, and [7J [55] provide excellent introductions to the 
field. Different aspects of numerical computation on implicitly defined manifolds are 
covered in [441 1341 133] . Generalized inverses are important in the study of implicitly 
defined manifolds, and we will use a result of Decell [5]. 

An important starting point for our work is the paper of Dedieu and Nowicki 
[6J where the authors develop an initial value problem (IVP) for the computation of 
geodesies on implicitly defined manifolds. This result, together with the IVP defining 
geodesies in the parametrized case [7] , constitutes the basis for the IVPs developed in 
the following sections. A similar approach is taken in |43| for computing Jacobi fields 
on the infinite dimensional manifold of diffcomorphisms. Several authors have studied 
the solution of the exponential map inverse problem, often called the logarithm map: 
in [35] [2T] [315] , different schemes are used to evolve an initial path towards a geodesic, 
and [55] [55J [35] use shooting methods. We build upon these works by assuming the 
logarithm problem is solved for the manifolds in question. 

An optimization problem can be posed on a manifold in the sense that the domain 
of the cost function is restricted to the manifold and the sought for optima must reside 
on the manifold. Such problems are extensively covered in the literature (e.g. [2S1IT5] ). 
The optimization problems we will solve involves the manifold geometry in the cost 
functions, but the domains will be the linear tangent spaces or subsets thereof with 
simple geometry. Therefore, the complexity will lie in the cost functions and not 
the optimization domains, and we will not need to use the optimization algorithms 
dealing with manifold domains. 

The manifold generalization of linear PCA, PGA, was first introduced in [5] , but 
it was formulated in the form most widely used in [10 . It has subsequently been used 
for several applications. To mention a few, the authors in [10J [TTJ study variations 
of medial atoms, |4TJ uses a variation of PGA for facial classification, [35] presents 
examples on motion capture data, and [3 9) applies PGA to vertebrae outlines. In 
addition, finding principal modes in tangent spaces, the procedure labeled linearized 
PGA in this paper, has been used for analyzing spine deformation modes and deformi- 
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ties in [31 [2] • The algorithm presented in 10J for computing PGA with tangent space 
linearization is most widely used. In contrast to this, [35] computes PGA as defined 
in [5] without approximations, but only for a specific manifold, the Lie group SO (3). 
Our recent paper |38j uses the methods presented here to experimentally assess the 
effect of tangent space linearization, and we show that the algorithms work on high 
dimensional manifolds modelling real-life data. 

A recent wave of interest in manifold valued statistics from the statistical com- 
munity has lead to the development of Geodesic PCA (GPCA, [TSJ QU H7J). GPCA 
is in many respects close to PGA but optimizes for the placement of the center point 
and minimizes projection residuals along geodesies instead of maximizing variance 
in geodesic subspaces. GPCA uses no linear approximation, but it is currently only 
computed on spaces where explicit formulas for geodesies exist and on quotients of 
such spaces. 

1.2. Content and Outline. The paper will present the following main contri- 
butions: 

(1) We construct initial value problems allowing the computation of the differ- 
ential of the exponential map and Jacobi fields, and second derivative of the 
exponential map on both parametric and implicitly represented manifolds of 
finite dimension. 

(2) We show how the tools developed allow for numerical computation of the 
sectional curvature and injectivity radius bounds for the manifolds. 

(3) We present an algorithm allowing the computation of PGA without linearizing 
the problem to the tangent space. 

(4) We present examples showing the differences between exact PGA and the 
linearized PGA previously used, and how the differences depend on the cur- 
vature of the manifold. 

Due to the generality of the setup, the algorithm in |(3)| will work for many of the appli- 
cations using PGA as defined in [10] . In particular, it will apply to those of the above 
mentioned examples using finite dimensional manifolds with available parametrization 
or implicit representation. We comment more on the classes of manifolds covered in 
section [2.1 1 In addition, we will need some assumptions on the manifold and dataset 
ensuring the optimization problems are well-behaved so that true global optima are 
found. 

The importance of curvature computations is noted in |18j , which lists the ability 
to compute sectional curvature as a high importance open problem. The result of |(2)| 
can be seen as a partial solution to this problem; we are indeed able to numerically 
compute the sectional curvature, although for the anomalous shape-spaces [23j used 
in |18j no parametrization or implicit representation is directly available, and hence 
the methods presented here do not apply. 

In the experiments |(4)| we evaluate how the difference between the methods vary 
as we increase the curvature of the manifold. This experiment, which to the best of 
our knowledge has not been made before, is made possible by the generality of the 
algorithms of |(1)| which frees us from previous restrictions to specific manifolds such 
as SO(3) [35] or anomalous shape-spaces [T5] . 

The paper will start by a brief discussion of the required notation and geometry in 
section[2j We will touch upon the definition of PGA and how curvature and injectivity 
radius bounds relate to Jacobi fields. The reader already familiar with Riemannian 
geometry may wish to skip parts of this section. In section [3j we present IVPs for the 
differential of the exponential map and Jacobi fields and for the second derivative of 
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the exponential map. The actual derivations are lengthy and are, therefore, covered 
in the appendices. Following this, in section |4j we develop the exact PGA algorithm. 
We end the paper with experiments in section [5] and concluding remarks. 

2. Geometry and Notation. We give a brief discussion of some aspects of 
differential and Ricmannian geometry and, at the same time, introduce the notation 
used in the rest of the paper. The reader is referred to [7] for an introduction to 
differential geometry and Riemannian manifolds. 

2.1. Manifolds and Their Representations. We will in the paper work with 
diffcrcntiablc manifolds of finite dimension, and, in the sequel, M will denote such a 
manifold of dimension ij. We will need M to be sufficiently smooth, i.e. of class C k for 
k = 3 or 4 depending on the application. A chart of M is then a map ip € C k (U, M) 
from an open subset U of W to the manifold, and, since a chart provides a coordinate 
representation of a part of the manifold, it is often called a local parametrization. 

Manifolds can be represented without local parameterizations. Let M be a level 
set of a differentiable map F : W n — > W\ If the Jacobian matrix D X F has full 
rank n for all x € M, the level set is said to be regular. In that case, M will be an 
(to — rt)-dimcnsional manifold, and we say that M is implicitly defined. The space K m 
is called the embedding space. Throughout this paper, when dealing with implicitly 
defined manifolds, to and n will denote the dimension of the domain and codomain 
of F, respectively. We then have r\ = to — n for the dimension rj of the manifold, 

In addition to local parametrizations and implicit representations, other ways of 
representing manifolds include discrete triangulations used for surfaces and quotients 
M/G of a larger manifold M by a group G. The latter is for example the case for 
Kendall's shape-spaces Sjj [23j. Kendall's shape-spaces for planar points are actually 
complex projective spaces CP k ~ 2 for which parameterizations are available, and, for 
points in 3-dimensional space and higher, the shape-spaces are anomalous and not 
manifolds. The spaces studied in [TJ] belong to this class. 

Our methods do not apply directly to cases where local parametrizations or im- 
plicit representations are not available. We note, however, that for the quotients used 
in |18) , M is a high-dimensional sphere and much of the optimization is performed on 
M instead of M/G. We are currently investigating how our methods can complement 
this in extending the approach to quotients M/G with M not restricted to being a 
sphere. 

2.2. Curves and Differentiation. We will deal with parametrized entities, 
most notably curves on manifolds, and we use subscripts for the parameter. For 
example, a curve on M dependent on t will be denoted xt- As our curves will normally 
start at t = 0, the starting point of curve x t will be the point Xq. The subscript 
notation should not be confused with differentiation with respect to the parameter 
t. When a local parametrization is available, we will often use it to represent the 
curve, and we will normally not distinguish between the curve and its expression 
x t — [x\, . . . , xj) in parameter space. 

The tangent space of M at a point p is, a vector space of dimension 77, will be 
denoted T p M, and the derivative J^Xt of a curve Xt evaluated at i then belongs to 
T X ,M. We will often write just 4zXf for such vectors, i.e. ^xt\ t -i- In addition, when 
differentiating curves with respect to t, we often use the shorthand x t - With these 
conventions, 4iXt\t= Oi the initial velocity of the curve Xt, will be written Xq. 

The differential of a map / : M — > N will be denoted df and its evaluation at 
p E M will be denoted d p f. When bases for T p M and Tft p \M are specified, or when 
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M and N are Euclidean spaces, we will write Df instead of df. We will encounter 
maps denned on a product of manifolds, e.g. (v, w) > g(v,w) : M x M — > N, for 
which we will need to distinguish differentiation with respect to one of the variables 
only. Letting one of the parameters have a fixed value Wo, the differential of the 
restricted function v i->- g(v,Wo) from M to N evaluated at vq is denoted dj vg Wg ^g- 
Along the same lines, if V is a submanifold of M, the differential of f\v ■ V — > N will 
be denoted d veV f and its evaluation at tj e V will be written d% g v f . 

2.3. Riemannian Manifolds and Geodesies. We will work solely with Rie- 
mannian manifolds, i.e. diffcrcntiable manifolds endowed with a smooth family of 
inner products on their tangent spaces. More precisely, a Riemannian metric on a 
manifold M is a smooth map g which associates to p e M an inner product (•, ■) on 
T p M, and, in a local parametrization, g will be a smooth map to the space of sym- 
metric, positive definite matrices of order 77. The pair (M, g) is then a Riemannian 
manifold. When M is a submanifold of R m , the tangent space T p M of M at a point p 
can be identified with a linear subspace of R m of dimension 77, and the inner product 
(•, ■) will be chosen to be the restriction of the standard inner product of R m . 

The Riemannian metric determines notions such as length of curves, diffcrcnta- 
tion of vector fields, Christoffel symbols, and geodesies. If x t is a curve, the length 
l(xt) is given by the integral J \\it\\dt using the norm || • || on T Xt M induced by the 
metric. Computing directional derivatives of a vector field is done by a connection 
that associates to a pair (X, Y) of vector fields on M a new vector field denoted VyX 
so that (VyJ) (p) will be a directional derivative of X at p in the direction Y(p). 
A special connection, called the Levi-Civita connection, is associated to the Rieman- 
nian metric, and the connection defines the covariant derivative ^V t of a vector field 
Vt along a curve. On implicitly defined manifolds, ^Vt is simply the projection of 
the usual derivative of vector fields onto T Xt M, and, in a local parametrization, the 
covariant derivative of the vector fields (d Xl , . . . , d x ) defines the Christoffel symbols 

of the metric by the relations Vg x .d Xj = J2k=i^ij^x k - The rf functions T^(x) 
satisfy the symmetry relation T^. = r^. 

Geodesic curves, manifold generalizations of straight lines, are characterized by 
having vanishing intrinsic acceleration expressed by the covariant derivative of the 
velocity field, %Xt being zero. Geodesies are locally length minimizing and unique 
in the sense that given a point q and a velocity v € T p M 7 the geodesic passing q 
with velocity v is unique. The map which constructs geodesies given q and v is 
called the exponential map and denoted Exp. Thus, the unique geodesic is the curve 
x t = Exp p tv. 

For points q in a sufficiently small neighborhood of q, the length minimizing curve 
joining q and q is unique as well. Given q and q, the initial direction in which to travel 
geodesically from q in order to reach q is given by the result of the logarithm map 
Log g (g). We get the corresponding geodesic as the curve t n- Exp q (tLog q q), and 
hence Log 9 is the inverse of Exp g . Subsets Exp g _B r (0) of M with B r (0) being a ball 
in T q M and with the radius r > sufficiently small are examples of neighborhoods of 
q in which Log q (q) is defined. Whenever we use the Log-map, we will restrict to such 
neighborhoods without explicitly mentioning it. 

The gradient grad h of a real valued function h : M — > R is also defined using 
the metric: at p <G M, grad p h is the unique vector in T p M which represents d p h in 
the sense that d p h(v) = (grad p h, i>) for all v € T p M . Whenever a basis of T p M is 
specified, or when M is Euclidean, we switch to the usual notation Vft. Similarly, 
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the Hessian of h is denned by the relation Hessian(/i)X = V x grad h for all vector 
fields X. Again, when a basis of T p M is specified, or when M is Euclidean, we use 
the usual notation H(h). 

2.4. Geodesic Systems. When a manifold is represented by a parametrization, 
the value of exponential map can be found as the solution of the IVP 



x 



%3 

x = q, x = v 



(2.1) 



in parameter space at time t = 1. Recall that 77 denotes the dimension of the manifold 
and that a chart if : W 1 — > M is used to connect the parameter space and the manifold. 
This classical characterization of geodesies is not directly usable when the manifold is 
represented implicitly and, therefore, neither parametrization nor Christoffcl symbols 
are directly available. To handle this situation, a first order IVP for the computation 
of the exponential map on implicitly represented manifolds as developed in [6]. Here 
Exp o can be found as the x-part of the solution of the following IVP at time t = 1: 



x t = (l-D Xt F^D Xt F) Pt , 

xo = q, Po = v ■ 



(2.2) 



The map fj, : R m x R m -> E n is defined by (x,p) ^ -(D x F T ) i 'p, and the symbol At 
denotes the generalized inverse of the possibly non-square or singular matrix A [5]. 

2.5. Jacobi Fields and Global Geometry. Studying variations of geodesies 
leads to the notion of Jacobi fields, which encode important geometric information 
such as curvature and injectivity radius. In order to define Jacobi fields, let Xt >a 
be a family of geodesies parametrized by s, i.e. for each s, the curve t 1— > Xt.s is a 
geodesic. When fixing the position t on the curves but varying the parameter s, we 
obtain the vector field 4^Xt t o, and such a vector field is called a Jacobi field along 
the geodesic x t o(^] The Jacobi fields along a given geodesic are uniquely determined 
by the initial conditions Jo and % Jo, the variation of the initial points xo, s and the 
covariant derivative of the field at t = 0, respectively. Define q s = xq_ s , v s = io,s, 
and w — j^v . If -^q = Jo and w = J^Jo then ^Exp^ (tvo) is equal to J t [3 Chap. 
5]. Therefore, in cases when q s is constant and Jo therefore 0, we have the following 
connection between J t and dExp: 

d Vo Exp qo tw = J t ■ (2.3) 
Jacobi fields can equivalently be defined as solutions to the ODE 

D 2 

■jjaJt = -R(xt,Jt)x t (2.4) 

with R denoting the curvature endomorphism [71 Chap. 5]. For parametrized mani- 
folds, the ODE can be written in parameter space and can, in principle, be used for 



Recall that with the notation introduced in section 
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2.2 



4-xtfi equals 4-x t , s \ s =o- 



Fig. 2.1. The sphere S 2 with a Jacobi field along a geodesic connecting the poles. Each pole 
is a conjugate point to the other since the non-zero Jacobi field vanishes. The injectivity radius is 
equal to the length of the geodesic, it. 



numerical computations of Jacobi fields. The expressions are somewhat complicated, 



though, and we will obtain a different IVP by differentiating the system (2.1 1. The 
curvature endomorphism is not easily computed when the manifold is represented 
implicitly, and, therefore, the above ODE is not directly useful in this case. By 



differentiating the system (2.2), we remedy this in the next section. 

Besides allowing us to calculate d„ Expg , Jacobi fields enable us to retrieve var- 
ious geometric information about the manifold. We can for example estimate the 
sectional curvature of the manifold at qq using a Jacobi field Jt as defined above with 
Jo = and Vq, w orthonormal. Performing a Taylor expansion of the length || J t ||, we 
get 



\J t \\ = t--K q0 (a)t 3 + O(t 4 ) 



where K qo {a) is the sectional curvature of the plane span{w ,w} in T qo M [7J Chap. 
5]. For small t, the sectional curvature can then be estimated by 

K qo (a)*^(t-\\J(t)\\) . (2.5) 

Furthermore, if Jt is a non-zero Jacobi field with Jo = along a geodesic Xt and, for 
some t > 0, also Jj = then xj is called a conjugate point to xq. This implies that for 
any r > t, the geodesic xt is not the shortest joining xq and x r Chap. 13]. In this 
way, we get an upper bound on the injectivity radius of M, which, in general terms, 
specifies the minimum length of non-minimizing geodesies. Figure [2~T| illustrates the 
situation on the sphere § 2 . 

2.6. Geodesic Subspaces. Linear subspaces are of great importance when 
studying data in Euclidean spaces; PCA, for example, can be formulated as an opti- 
mization problem on the set of linear subspaces. There is no obvious generalization of 
linear subspaces to manifolds, but, if one accepts the choice of a center point, the no- 
tion of geodesic subspaces becomes useful. A subset F>xp q V of M is called a geodesic 
subspace centered at q if V is a linear subspace of T q M. Ceodesics between q and any 
point in the subspace are contained in the subspace, a fact which, in general, is not 



true for geodesies between arbitrary pairs of points in the subspace. The projection 
of a point x € M onto a geodesic subspace S = Exp q V is defined as 



ir s (x) = a,vguuTL yeS d{x,y) 2 = argmm yeS ||Log y a: 
= Exp g (argmin lueV ||Log Exp xf) . 



(2-6) 



Neither existence or uniqueness of the projection is in general ensured, although, for 
each geodesic subspace S, the set of points for which uniqueness fail has zero measure 
in M 18J. Existence of the projection is ensured if S is compact, which, for example, 
is the case if M is compact and S an embedded submanifold. 

2.7. Principal Geodesic Analysis. Principal Component Analysis (PCA) is 
widely used to model the variability of data in Euclidean spaces. The procedure 
provides linear dimensionality reduction by defining a sequence of linear subspaces 
maximizing the variance of the projection of the data to the subspaces or, equivalently, 
minimizing the reconstruction errors. The feth subspace is spanned by an orthogonal 
basis {v 1 , . . . , v k } of principal components v 1 , . . . , v k , and the iih principal component 
is defined recursively by 



1 I ' \ 




v 1 = argmaxii 

when formulated as to maximize the variance of the projection of the dataset {x\, . . . , xn} 
to the subspaces span {v 1 , . . . , v 1 ^ 1 }. 

PCA is dependent on the vector space structure of the Euclidean space and hence 
cannot be performed on manifold valued datasets. Principal Geodesic Analysis was 
developed to overcome this limitation. PGA finds geodesic subspaces centered a 
point n € M with fi usually being an intrinsic mearj^] of the dataset {x\, . . . ,xn}, 
Xj 6 M. The fcth geodesic subspace Sk of T M M is defined as Exp M (Vfc) with Vk — 
span {v 1 , . . . , v k } being the span of the principal directions v 1 , . . . , v k defined recur- 
sively by 



1 N 

o' = argmax| H | =Mey i — ^2d(n,<K Sv (xj)) 



S v = Exp M (span{y i _ 1 ,u}) . 

The notation V^-^ denotes the orthogonal complement of Vt-i in T^M . The term 
being maximized is the sample variance of the projected data, the expected value 
of the squared distance to /i, and PGA therefore extends PCA by finding geodesic 
subspaces in which variance is maximized. 

Since a method for computing the projection irs k (x) has not been available for 
general manifolds, PGA has traditionally been computed using the orthogonal pro- 
jection in the tangent space of /i to approximate the true projection. With this 



approximation, equation (2.81=1 simplifies to 

N / i-1 



ar s max ii«n=i^ E ( ( L %A'> v ) 2 + E ( L °sn x 3> vi y 

3=1 V 1=1 



2 The notion of intrinsic mean goes back to Frechet |14| and Karcher 21 . As in I10| . we define 
it as argmin MgM 5ZjLi d(ti, %j) 2 - Uniqueness issues is treated in |21| . 



which is equivalent to (2.7), and, therefore, the procedure amounts to performing 
regular PCA on the vectors Log^xj. We will refer to PGA with the approximation 
as linearized PGA, and PGA as defined by (2.8M will be referred to as exact PGA. 



The above and prevalent definition of PGA is developed in [TU], but a slightly 
different definition was introduced in [5]. The latter definition involves only one- 
dimensional subspaces and uses Lie group structure. In |35j . the fact that its has a 
closed form solution on the sphere S 3 when S is a one-dimensional geodesic subspace 
is used to compute exact PGA with the [9] definition by performing a steepest descent 



using the gradient of the cost function equivalent to the cost function of (2.8 k 



Replacing maximization of sample variance by minimization of reconstruction 
error, we obtain another manifold extension of PCA and thus an alternate definition 
of PGA: 



argminii 



N 



d(xj,ws v (xj)f 



(2.8^ 



In contrast to vector space PCA, the two PGA definitions are not equivalent, a fact 
showing that the Euclidean and curved situations differ fundamentally. The latter 
formulation is chosen for Geodesic PCA to avoid instabilities of variance maximiza- 
tion [18] . but the optimization algorithms developed in this paper work for both 
formulations. We will use the variance formulation for the experiments, but we will 



collectively refer to definitions by (2.8 1 



In general, PGA might not be well-defined as the mean might not be unique and 



both existence and uniqueness may fail for the projections (2.6) and the optimization 



problems ( 2.8 1 . The convexity bounds of Karcher 21J ensures uniquesness of the mean 



for sufficiently local data, but setting up sufficient conditions to ensure well-posedness 



of (2.6) and (2.8) is a difficult issue, and here we will just assume well-posedness for 



the given manifold and dataset. 



3. The Differentials. In this section, we aim at developing an initial value 
problem (IVPs) describing the differential of the exponential map and Jacobi fields, 
and, in addition, we will differentiate the IVPs a second time and thereby create the 
tools needed for the PGA algorithms presented in the next section. The basic strategy 
is simple: we differentiate the systems of section [2~4| and use the resulting IVPs. 

It is a well-known fact that IVPs satisfying natural properties are diffcrcntiablc 
with respect to their initial values [16j Chap. 1.14]. The important contribution of this 
section is the explicit expressions for the differentiated systems that allow numerical 
integration and, in particular for the case of implicitly represented manifolds, are 
not straightforward to derive. To the best of our knowledge, no IVP describing the 
differential of the exponential map and Jacobi fields has previously been available in 
the implicit case; the IVP (3.3) remedies this situation. As previously noted, the 



ODE (2.4) describes Jacobi fields in the parameterized case but the expressions in 



parameter space are complicated. Therefore, we derive the IVP (3.2) below, which 
we find simpler to work with for the applications of this paper. 



The presence of the generalized inverse in system (2.2) proves to be the main 



source of complexity for the implicit case. We handle the differentiation of this system 
using the following result of Decell: 



Theorem 3.1 ([5]). Let A s and its generalized inverse A\ be differentiate s- 
dependent matrices. Then -^(A\) = A(A S , j^A s ) where 



A(A,B) = - A^BA^ + (B T (A^) T A^ +A\A^) t B t ) 

(3 1) 

-a^a(b t {a^) t a^ + a\a^) t b t )aa^ . 



We will apply the result with A s — D Xt 3 F with x t , s an s dependent family of 
geodesies and t fixed. To see that D Xt s F* is differentiable with respect to s when Xt, s 
depends smoothly on s, take a frame of the normal space to M in a neighborhood 
of x t . s , and note that D Xt e F* is a composition of a invertible map onto the frame 
depending smoothly on s and the frame itself. 

The remaining computations for deriving the systems are lengthy and notationally 
heavy. At this point, we only state the results and postpone the derivations and the 
proof of the following theorem to Appendix |A} 

Theorem 3.2. Let x t be a geodesic in the C 3 manifold M with x — q and 
xq = v, and let u,w be vectors in T Xo M . Assume Xt is contained in a parametrized 
subset of M . Then the Jacobi field J t along x t with Jo = u and ^ Jo = w can be 
found as the z-part of the solution of the IVP 

ZtJ V U " (3.2) 

2/o \ _ (W 



Zq I \U 

with F£ v the map given in explicit form in Appendix\A\ 

Now, let instead M C 1™ be defined as a regular zero level set of a C 3 map 
F : K' n — > R™. Then the Jacobi field Jt along x± with Jq = u and % Jq = w can be 
found as the z-part of the solution of the IVP 



Vt\ _ p i ( t IVt 



ztl ^ v \ \z t 



2/0 \ = (w 

Zq) \U 



(3.3) 



with Fg V the map given in explicit form in Appendix p| 

The maps F~ v and F* v and consequently the systems (3.2) and (3.3) are linear 
in the initial values (w u) T as expected of systems describing differentials. They are 
non- autonomous due to the dependence on the position on the curve Xt- 

The following corollary allows the computation of the derivative of the exponential 
map: 



Corollary 3.3. With the assumptions of Theorem \3. 6 A let (y t ,zt) satisfy (3.2) 
or (3.3) with IVs (w,0) T . Then d v Exp q w is equal to Z\. 



Proof. Let J t be the Jacobi field along xt with Jq = and %Jo — W. By 



Theorem 13. 2| z\ — J\, which, by (2.3), is equal to d v ¥xp q w. 
□ 

The result enables us to compute the entire differential d„Exp ? by applying the 
corollary to each element of a basis {w 1 , . . . , w v } for T q M . The matrix having the 
results in its columns then equals Z^Exp^. Note that Exp^Log^jy = y implies that 
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dyLog q = (dLog^yExpg) , a fact that allows the corollary to be used for computing 
d y hog q as well. 



We can differentiate the systems (3.2 1 and (3.3) once more if the manifold is 



sufficiently smooth. The main difficulty here is performing the algebra of the already 
complicated expressions for F qv and F* v . For the implicit case, we will need to find 
the second derivative of D Xts F* and hence extend Deceit's result. For simplicity, 
we consider a family of geodesies Xt jS with the start point xq jS constant in s. The 
derivations and the proof are again postponed to Appendix [A) 

Theorem 3.4. Let w € T q M with M of class C A , and let xt :S be a family 
of geodesies with xq_ s — q and v s — io,s- Define u — j^vo, and let V q ^ VOtW _ u — 

^ (dt, s Exp 9 u>) — (Exp 9 « s + no) . Assume Xt jS is contained in a parametrized 

subset of M . Then V q _ Vo _ WtU can be found as the r-part of the solution of the IVP 

/( 'H \ = g p ( , ('ii 



f.i q,v a ,w,u . . . r 



^ ' 1 1 \ ( 3 - 4 ) 



r J V° 

with Gg Vo w u the map given in explicit form in Appendix p| 

Now, let instead M C M. rn be defined as a regular zero level set of a C 4 map 
F : K m — > K™. Then Vq tVOiW u can be found as the r-part of the solution of the IVP 

9t\ =r i ( f ( <h 



r J V° 

with G l „ VaWU the map given in explicit form in Appendix 



We note that solutions to (3.4) and (3.5) depend linearly on u even though the 
systems are not linear. 



:arh 



3.1. Numerical Considerations. The geodesic systems (2.1) and (2.2) can in 
both the parametrized and implicit case be expressed in Hamiltonian forms. In [B], 
the authors use this property along with symplectic numerical integrators to ensure 
the computed curves will be close to actual geodesies. This is possible since the 
Hamiltonian encodes the Riemannian metric. Derivatives of Hamiltonian systems can 
be expressed in Hamiltonian form, and, therefore, the systems of Theorem 13.21 and 
Theorem 13.41 have Hamiltonian formulations. Using symplectic integrators, we can 
preserve the Hamiltonians, but the usefulness of this is limited since the Hamiltonians 
do not have directly interpretable forms in contrast to the case of geodesic systems. 

Along the same lines, we would like to use the preservation of quadratic forms 
for symplectic integrators |15) to preserve quadratic properties of the differential of 
the exponential map, e.g. the Gauss Lemma [7j- At this point, we have, however, not 
been able to establish this for the implicit case. 

4. Exact PGA. We will provide algorithms for iteratively solving the optimiza- 



tion problems (2.8) and hence compute exact PGA as defined in [10] without the 
traditional linear approximation. The algorithms will work for parametrized and im- 
plicitly represented manifolds under the following assumptions. First, we require that 



the PGA problem is well-defined as discussed in section 2.7 Second, the logarithm 
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map must be computable. As noted in the introduction, good implementations exist 
for both parametric and implicitly represented manifolds. Third, we will need to as- 



sume non-existence of local optima for the cost functions of (2.6 1 and (2.8) to ensure 
the optimization algorithms find the true global solutions. Forth, a local convexity 
assumption of the residual function, which is satisfied for local data, will be needed. 
We note that, if the third assumption is left out, it is indeed possible to find examples 
of manifolds and datasets where the algorithms will get stuck in local optima. 



Solving the optimization problems (2.8) requires the ability to compute the pro- 
jection operator its- We start by finding expressions for the gradients of the cost 
functions of the optimization problems using the IVPs derived in section [3j and, 
thereafter, we present the actual algorithms for solving the problems. The overall 
approach of solving (2.8) is similar to the approach of [35] • Our solution differs in 



that we are able to compute and its differential without restricting to the manifold 
SO(3) and in that we optimize (2.8) instead of the simpler]^] cost function of [3]. 



The optimization problems (2.6) and (2.8) are posed in the tangent space of the 



manifold at the sample mean and the unit sphere of that tangent space, respectively. 
These domains have relatively simple geometry, and, therefore, the complexity of 
the problems is contained in the cost functions. Because of this, we will not need 
algorithms for optimizing problems with domains of complicated geometry. 

As we are able to compute the gradient of the cost function of the problems, 
we can use approaches such as steepest descent. Yet, because both problems are 
quadratic, optimization algorithms such as Gauss-Newton or Levenberg-Marquardt 
are also applicable if the Jacobians are present. For simplicity, we compute gradi- 
ents and present steepest descent algorithms, but it is straightforward to compute 
Jacobians instead and use more advanced optimization algorithms. 

4.1. The Projection. We consider the projection tts(x) of a point x £ M on 
a geodesic subspace S. Assume S is centered at /j e M, let V be a /c-dimensional 
subspace of T^M such that S = Exp^F, and define a residual function R x ,^ ■ V — s- R 
by w i— > ||Log Exp w x\\ 2 measuring distances between x and points in S. Computing 



TTs(x) by solving (2.6) is then equivalent to finding w £ V minimizing R x ,^- To find 



the gradient of R x ,fj,, choose an orthonormal basis for V and extend it to a basis for 
T^M. Furthermore, let wq £ V and choose an orthonormal basis for the tangent 
space Tgxp «, M. Karcher showed in [21] that the gradient grad y ||Log y a;|| 2 equals 
— 2Log y a;, and, using this, we get the gradient of the residual function as 

T 



YjwEV p 
v w n x,p 



Exp mo" 



(4.1) 



with (D. U ; Exp„)i ) ... ! fc denoting the first k columns of D WQ Exp^ when expressed using 
the chosen bases. 



4.2. The Gradient of the Projection. In order to optimize (2.8), we will need 
to compute gradients of the form 

gmdlf^ d(y,Tr Sv (x)) 2 (4.2) 
with V Vo — span {v 1 , . . . , v k , vq}, S v = Exp(T^, ), and y £ M being either the intrinsic 



mean [i for ( pE8R or x for ( j2j8TO )ij This will involve the gradient of irs v (x) with 



3 Simple r in the sense that projections in [9] involve only one-dimensional subspaces. The cost 
function of l |2.8| uses i-dimensional subspaces for i = 1, . . . , r). 

4 Since v in l |2.8| is restricted to the unit sphere, we will not need the gradient in the direc- 
tion of vq, and, therefore, we find the gradient in the subspace instead of in the larger space 
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respect to v. To derive this, we extend the domain of residual function R x ,n defined 
in the previous subsection from V to T^M. We will choose bases for T^M and V Vo , 
and we let H(R X ^) denote the Hessian of R x ,^ and H(R x<tl \v V0 ) denote the Hessian 
of R x ,fj, restricted to V Vo with respect to the bases. Using this notation, we get the 
following result: 

Theorem 4.1. Let {v 1 , . . . , v k } be a basis for a sub space V C T^M. For each 
v G V , let V v be the subspace span {V, w}, and let S v — Exp„V^ be the correspond- 
ing geodesic subspace. Fix vq G V and define Wq = hog fi Trs v (x) for an x G M. 
Suppose the matrix H V[) (R x ^\y vo ) has full rank fc + 1. Extend the orthonormal basis 
{v 1 , . . . , v k , Vq/\\vq\\} for V Vg to an orthonormal basis for T^M . Then 

Dv v °irs v (%) = -(D Wo Exp )v Xt ^ VOi s [V Wo v ° R x 



(4.3) 

(D Wo 'Exp f j,)E x ^,v ,Sv ■ 
The coordinates of the vector v XtfjttVOt s vo in the basis for V Vo are contained in the 
(k + l)st column of the matrix H VQ {R x ,fi\v VQ ) _1 > the scalar Wq +1 is the (k + l)st 
coordinate of wq in the basis, and E XtfifV0} s v is the matrix 



Hw {Rx,fj,\v VQ ) B WO:Vo 

with B WgyVg the last n — (k + 1) columns of the matrix (H Wo (R x .^) (V vo)) T and 
I v -(k+i) the identity matrix. 

The proof of the theorem is presented in Appendix [EJ The assumption that the 
Hessian of the restricted residual Rx,ii\v v must have full rank is equivalent to the 
residual R x „ having only non-degenerate critical points when restricted to V VQ . It is 
shown in [21 that R x _^ is convex at points sufficiently close to x and the assumption 



is therefore satisfied in such cases. In order to compute the right hand side of (4.3), it 
is necessary to compute parts of the Hessian of the non-restricted residual R x ^. The 
expression for computing H Vo (R Xttl ) is given in Appendix |B] 
Because d(y,ns v {x)) 2 = ||Log 7rs u {x)\\ 2 , we have 

V^%,^W) 2 = 2((^ o(a) LogJ(< G ^7r S) ,(a ; ))) (Lo gl/ 7r s . (a:)) , (4.4) 



which, combined with (4.3), gives (4.2) 



4.3. Exact PGA Algorithm. The expressions for the gradients of the cost 



functions enable us to iteratively solve the optimization problems ( 2.6 ) and ( ]2.8[ ) under 
the mentioned assumptions. We let fi be the intrinsic mean of a dataset {xi, . . . , Xn} 
of points in M. The actual algorithms listed below are essentially steepest descent 
methods. 

Algorithm [l] for computing ns(x) updates w G V instead of the actual point y G S 
that we are interested in. The vector w is related to y by y = Exp^w. 
For solving (|2.8|), we use that 

1 N \ 1 N _L 




span {v l , . . . , v k } 
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Algorithm 1 Calculate tts(x) 



Require: x € M, S = Exp^y geodesic subspace. 

w <= orthogonal projection of Log^x onto V {initial guess} 
repeat 

y 4= Exp M u> {vector to point} 
g <= -2(D^Expp)T k Log y x {gradient} 
w <= w {previous w} 
w <= w — g {update w} 
until \\w — w\\ is sufficiently small. 



Since v in (2.8) is required to be on the unit sphere, the optimization will take place 



on a manifold, and a natural approach to compute iteration updates will use the 
exponential map. Yet, because of the symmetric geometry of the sphere, we approx- 
imate this using the simpler method of adding the gradient to the previous guess 
and normalizing. When computing the (k + l)st principal direction, we choose the 
initial guess as the first regular PCA vector of the data projected to in T^M. The 
algorithm for solving (2.8* I is listed in Algorithm [2j but by exchanging fj, with Xj in 
the gradient computations and updating by subtracting the gradient, the algorithm 



will solve (2.8 1=1=) instead. See Figure 4.1 for an illustration of an iteration of the 
algorithm. 




Fig. 4.1. An iteration of Algorithm^ The figure shows data points x\ and X2 (red points) with 
projections (blue points) to the geodesic subspace S (green line). The vector v defining S is updated 
to the new guess by adding the gradient g. 



5. Experiments. We will perform experiments exemplifying the differences be- 
tween exact PGA and linearized PGA with synthetic data projected onto low dimen- 
sional manifolds on which it is possible to visually identify the differences between 
the methods. We vary the curvature of the manifolds in order to show how curvature 
affects the differences, and we compare the curvature approximation (2.5) and injec- 
tivity radius bound with the true values. For a comparison between the methods on 
high dimensional manifolds modelling real- life data, we refer the reader to [38] . In 
that paper, we compute and compare exact and linearized PGA on a 50 dimensional 
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Algorithm 2 Calculate the (fc + l)st principal direction of ( 2.8|> 1 



Require: (i,Xi, . . . ,£Bjv € M, {u 1 , . . . , orthogonal basis for Vk C T^M. 
w <= first PC A vector of {xA projected first to T^M 
2; and then to V^r 



using Log and then to {initial guess} 



repeat 



9j <= VjJ eV " d(n,TTSv(xj)) 2 {for each j using (4.4)} 



.9 jf Ej=i ffj {gradient using ( gj) )} 
■5 -4= f {previous w} 
u •<= u + g {update v} 
v <f= f/ 1 1 v|| {normalize} 



until \\v — ti|| is sufficiently small. 



manifold containing outlines of human vertebrae captured with lateral X-rays and 
on a 23 dimensional manifold containing human pose data acquired with tracking 
software. 

The PGA algorithm is implemented in Matlab using Runge-Kutta ODE solvers. 
For the logarithm map, we use the shooting algorithm developed in [39]. All tolerances 
used for the integration and logarithm calculations are set at or lower than an order 
of magnitude of the precision used for the displayed results. 

5.1. Synthetic Low-dimensional Data. We consider first surfaces embedded 
in M 3 and defined by the equation 

S c = {(x 1 ,x 2 , x 3 )\cxj + x\ + xl = 1} 

for different values of the scalar c. For c > 0, S c is an ellipsoid and it is equal to 
S 2 in the case c = 1. The surface 5o is a cylinder and, for c < 0, S c is hyperboloid. 
Consider the point p = (0, 0, f ) and note that p £ S c for all c. The curvature of S c at 
p is equal to c. Note in particular that for the cylinder case the curvature is zero; the 
cylinder locally has the geometry of the plane ]R 2 even though it informally seems to 
curve. 

We evenly distribute 20 points along two straight lines through the origin of 
the tangent space T p S c , project the points from T p S c to the surface S c , and perform 



linearized and exact PGA using the variance formulation (2.8*). Figure 5.1 illustrates 
the situation in T p S-\ and on S—\ embedded in M 3 , respectively. 

Since linearized PCA amounts to Euclidean PCA in T P S C . the first principal 
direction found using linearized PGA divides the angle between the lines for all c. In 
contrast to this, the variance and the first principal direction found using exact PGA 
are dependent on c. Table [BTT] shows the angle between the principal directions found 
using the two methods, the variances and variance differences for different values of 
c. 



c: 


1 


0.5 





-0.5 


-1 


-1.5 


-2 


-3 


-4 


-5 


angle (°): 


0.0 


0.1 


0.0 


22.3 


29.2 


31.5 


32.6 


33.8 


34.2 


34.5 


linearized var.: 


0.899 


0.785 


0.601 


0.504 


0.459 


0.435 


0.423 


0.413 


0.413 


0.417 


exact var.: 


0.899 


0.785 


0.601 


0.525 


0.517 


0.512 


0.510 


0.508 


0.507 


0.506 


difference: 


0.000 


0.000 


0.000 


0.212 


0.058 


0.077 


0.087 


0.095 


0.094 


0.089 


difference (%): 


0.0 


0.0 


0.0 


4.2 


12.5 


17.6 


20.6 


23.0 


22.7 


21.4 



Table 5.1 

Differences between methods for different values of c. 
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(a) T p 5_i with sampled points and first prin- (b) 5_i with projected points and first princi- 
cipal components (blue exact PGA, green lin- pal components (blue exact PGA l |2.8[ l, green lin- 
earized PGA). earized PGA). 



Fig. 5.1. 



Let us give a brief explanation of the result. The symmetry of the sphere and 
the dataset cause the effect of curvature to even out in the spherical case Si- The 
cylinder Sq has local geometry equal to R 2 which causes the equality between the 
methods in the c = case. The hyperboloids with c < 0, which can be constructed 
by revolving a hyperbola around its semi-minor axis, are non-symmetric causing an 
increase in variance as the first principal direction approaches the hyperbolic axis. 
The effect increases with the curvature causing the first principal direction to align 
with the hyperbolic axis for large negative values of c. We see that, for all negative 
values of c, exact PGA is able to capture more variance in the subspace spanned by 
the first principal direction than linearized PGA. 

Using (2.5), we can approximate the sectional curvature K p of S c at p. The 
approximation is dependent on the value of the positive scalar t with increasing pre- 
cision as t decreases to zero. Table 15.21 shows the result of the sectional curvature 
approximation for two values of t compared to the real curvature. 



c: 






1 





-1 


-2 


-3 


K p 






1 





-1 


-2 


-3 


K p 


est. 


t = 0.01: 


1.000 


0.000 


-1.000 


-2.000 


-3.000 


K p 


est. 


t = 0.1: 


1.000 


0.000 


-1.001 


-2.002 


-3.005 



Table 5.2 



Sectional curvature at p for different values of c. 



Now let 
geodesic x t - 
see that || J„\ 



Jt be the Jacobi field w ith Jp 
,0) T 



Ex Pp i(0, 



1. 



Figure 



5.2 



§J = (1,0, 0) T along the 



and 

shows II Ji|| for different values of c 



We 



for the spherical case £q showing that aq is a conjugate point and 
hence giv ing the upper bound it on the injectivity radius. The situation is illustrated 
in Figure 2.1 The local geometric equivalence between the cylinder So and ]R 2 causes 
the straight line for c = 0. For all c < 1, the injectivity radius of S c is tt, but for 
c < 1, the point x n not a conjugate poinlJ^J By looking at ||Jt||, we are only able to 
detect conjugate points and hence, with this experiment, we only get the bound on 
the injectivity radius for c > 1. For c > 1 the injectivity radius decreases below 1 as 
seen in the case S2 with || Jf|| = for t « 



5 For c < 1, x n is a cut point [71 Chap. 13]. 
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Fig. 5.2. || J t || for c = 2, 1,0, -1 when Jo = 0, £ Jo = (1,0, 0) T , and x t = Exp p i(0, 1,0) T . 



To investigate the difference with more than one principal direction, we consider 
a four dimensional manifold embedded in M 5 and defined by 

M 4 = {(xi,x 2 , x 3 , i 4 , x 5 )\xl — 2x\ + xl — 2x 4 + x 5 = 1} . 

We make the situation more realistic than in the previous experiment by sampling 
32 random points in the tangent space T V M.±, p = (0,0,0,0,1). Since T p M 4 is an 
affinc subspace of M 5 orthogonal to the x$ axis, we can identify it with K 4 by the 
map (xi, X2, X3, £4) i-> (x%,X2,X3,X4, 1). We use this identification when sampling by 
defining a normal distribution in R 4 , sampling the 32 points from the distribution, and 
mapping the results to T p M 4 . The covariance is set to £ = diag(2, 1, 2/3, 1/3) to get 
non-spherical distribution and to increase the probability of data spreading over high- 



curvature parts of the manifold. Table 5.3 lists the variances and variance differences 



for the four principal directions for both methods along with angular differences. The 
lower variance for exact PGA compared to the linearized method for the 2nd principal 
direction is due to the greedy definition of PGA; when maximizing variance for the 
2nd principal direction, we keep the first principal direction fixed. Hence we may 
get lower variance than what is obtainable if we were to maximize for both principal 
directions together. 



Princ. comp.: 


1 


2 


3 


4 


angle (°): 


10.1 


10.6 


12.0 


12.2 


linearized var.: 


1.58 


3.86 


4.13 


4.35 


exact var.: 


1.93 


3.85 


4.24 


4.35 


difference: 


0.35 


-0.01 


0.11 


0.00 


difference (%): 


21.9 


-0.3 


2.6 


0.0 



Table 5.3 

Differences between the methods on M4. The variances of the data projected to the subspaces 
spanned by the first k principal directions and the percentage and angular differences are shown for 
k = 1,...,4. 



We clearly see angular differences between the principal directions. In addition, 
there is significant difference in accumulated variance in the first and third principal 
direction. We note that the percentage difference is calculated from what corresponds 
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to the accumulated spectrum. The percentage difference of the increase between the 
second and third principal direction, corresponding to the squared length of the third 
eigenvalue in regular PCA, is greater. 

6. Conclusion. We have developed initial value problems allowing the compu- 
tation of several important geometric structures on both parametrized and implicitly 
represented manifolds. We show how the constructed IVPs allow for numerical com- 
putation of injectivity radius bounds and sectional curvatures, which partially solves 
an open problem stated in [18 . Furthermore, the IVPs make possible computation 
of exact Principal Geodesic Analysis eliminating the need for the traditionally used 
linear approximations. 

The experimental section presents examples of manifold valued datasets where 
exact PGA improves linearized PGA, and we show how the differences between the 
methods are dependent on the curvature of the manifolds. The differences are signif- 
icant and clearly visually identifiable. 

We are currently in the process of extending the methods to work for quotient 
manifolds M/G and thereby allowing the computations to be performed on practi- 
cally all commonly occurring non-triangulated manifolds. We expect this would allow 
Geodesic PCA to be computed on general quotient manifolds as well. In addition, 
we are working on giving a theoretical treatment of the differences between the two 



formulations (2.8) of PGA. Finally, we expect to use the automatic computation of 
sectional curvatures to investigate further the effect of curvature on exact PGA and 
other statistical methods for manifold valued data. 

Acknowledgements. The authors would like to thank P. Thomas Fletcher for 
fruitful discussions on how to compute exact PGA and Nicolas Courty for important 
remarks on problems linked to data locality. Furthermore, we wish to thank S0ren 
Hauberg and Morten Engell-N0rregard for their help with producing data for testing 
the algorithms on high dimensional manifolds. 

Appendix A. Derivation of the Derivative ODEs. We will use tensors on 
R v and R m for the proofs of Theorem 13.21 and Theorem 13.41 and we will use the com- 
mon identification between tensors and multilinear maps, i.e. the tensor T : (R k ) r -> 

R defines a map multilinear map f : (M k ) r ~ 1 -> R k by (T(yi, . ■ ■ , y r -i), Vrj = 
T(y\, . . . , y r ). We will not distinguish between a tensor and its corresponding multi- 
linear map, and hence, in the above case, write T for both maps. 

For s-dependent vector fields v s ,i, ■ ■ ■ , v s ,r and tensor field T s , we will use the 
equality 

= (s T o) ( W 0,l, ■ • • j«0,r) + To(^U 0) l, • • • ,«0,r) H 1" T o{V0,l, ■ ■ ■ > £ v 0,r) 

for the derivative with respect to s. If T Xg is a composition of an z-dependent tensor 
field T z and an s-dependent curve x s , the derivative j^T Xg equals the covariant tensor 
derivative V d_ a .T Xg [3 Chap. 4]. Since we will only use tensors on Euclidean spaces, 
such tensor derivatives will consist of component-wise derivatives. 

In the following, we let be the z-dependent 3-tensor on W 1 defined by 

t z p ( Vi ,v 2 ,v 3 ) = -jr r&(*)«i«£«3 
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such that the kth component of T a ^(i t ,i t ) equals the right hand side of (2.1). Note 
that Tf is symmetric in the first two components since the Christoffel symbols are 
symmetric in i and j. Similarly, we let the ^-dependent 3-tensor T z ' p and 2-tensor 
T/ ,x equal the right hand side of the p and x parts of (2.2), respectively: 



T I ^(v u v 2 ) = -[Y J P k {z 1 v 1 )H z {F k ) 



V2 , 



T*'*(v) = (l-D z F^D z F)v 

We carry out the proof of Theorem I3.2l in two parts starting with the parametrized 
case. 

Proof. [Theorem I3.2j Let Xt, a be a family of geodesies with xt,o — x t, and define 
9s = %o,s and v s = xo jS . Assuming -^qo = u and j^vq = w, the Jacobi field Jt equals 
^E xp Qn ( top), and, therefore, we can obtain J t by differentiating the systems (2.1) 
and 



In the parametrized case, we get, using ( A.l ) and symmetry of T* 



d d 
dt 2 ds 



x t ,o = f s x t fi = a? T ^,o( i *,o» i *,o) 



(A.2) 



5^0,0 =tt, TtTsX ,o = w 



because Xt, s are solutions to (2.1 ) with initial conditions q s and v s . Therefore, setting 



Vi 



A A 

dt ds 



x t ,o and zt = ^%t,o, we get ( |3~2"| ) with 



f p ^ A/tV = ft? 'zt T x t (xt,Xt) + 2T*(y u x t )\ 

As noted above, the derivative V_d_ Xt Q T^ consists of just the component-wise deriva- 
tives of T z , i.e. the derivatives of the Christoffel symbols. 

For the implicit case, we use the map fi of section [2~4| to define the tensors 



Ti» = f,(z,v) , T?(vi,V2) = - J v 2 , 

lf(v) = (D Z F) v , and T?\v) = (D Z F)^ v. 



Note, in particular, that T^ p (v 1 ,v 2 ) = T**(T£(vi), v 2 ). We claim that £~Exp qo (tv Q ) 
equals the z-part of the solution of (3.3) with 



F 1 t 
q,v \ > 



Tlfipu z t ) + V 2t T£ (T£ ( Pt ), x t ) + T* (Tg t (y t ) - A(T£, V Zt T°) T Pu x t ) 



T'f(y t ) - A(T£, V Zt T»)T£(p t ) - T£ f V Zt Tj> t ) 



(A.3) 



Here p t — pt,o where pt tS are the p-parts of the solutions to ( 2.2 ) with initial conditions 
q s and v s . To justify the claim, we differentiate the system (2.2). Using (A.l ), we get 

d\£Pt<° = JsPtfi = Ji T x* (Pt,o,xt,o) 

= V^ t T*{T>i t (p t ),x t ) +T x H t (V^ t T£(p t ) + T£(£p t , ),i t ) 
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and 



d_ d 

dt 



Note that the tensor derivative V d „ Tl 1 consists of derivatives of H x ,(F k ). Both 
the derivatives V d „ T% and V a Tl' x involve derivatives of generalized inverses. 
Therefore, we apply Theorem 13. ll to differentiate T£ and get that 

V^ t , ^ = -A(T^,V^ T^f. 

The tensor derivative V_tL X( T^ t consists of derivatives of D Xt s F. Similarly, 



By differentiating the initial conditions, we get (3.3) with y = j^Pt,0: z = -^xtfi: and 
Fy V as defined in ( |A.3| ). □ 

For computing the second derivatives and proving Theorem I3.4( we will need to 
differentiate generalized inverses of matrices twice. For this task, we will use the 
lemma below, which follows directly from repeated application of the product rule for 
differentiation and Theorem 13. f I 

Lemma A.f. Let A t , s be s- and t-dependent matrices. If A t , s o,nd A\ s are 

differentiable with respect to both variables and the mixed partial derivative g^A tiS 
exists, then 



dsdt\ A t,. 



HA,s, gi^t,s, gi^-t,s, sjQi 



At,.) 



where 

A(A, B. C, D) 



-A(A, C)BA^ - A^DA^ - A BA(A, C) + Y(A, B, C, D) 

- (A(A, C)A + A+c) X(A, B)AA* - A^AY(A, B, C, D)AA* 

- A^AX(A, B) (CA* + AA(A, C)) 



X(A, B) = B T (A^) T A + A\rf) T B T , 

Y(A, B, C, D) = D T (A^) T A^ + B T U(A, C) t A^ + (A^) T A(A, C) 1 

+ (a(A, C)(A^) t + At(A(A, C)) t )b t + A^(A^ T D T 



We are then ready to prove Theorem 13.41 We will again start with the parame- 
terized case, and we will use the tensors introduced in the beginning of this section 
and the proof of Theorem 13.21 

Proof. [Theorem 13-4] We compute the q and r parts of G^ VoWU separately; denote 

respectively. Let {y™ s ,z]" s ) be solutions to (3.2) with 

IV's O,0)~ 



them d p ' q 

■ r and along the geodesies x t 



and let y™ and zf denote yj" and z 



respectively. Let also be solutions to (3.2) with IV's (it, 0) T along xt 



Differentiating system (3.2), we get 



t,0' 



dtds( Z Zo) - ds^Zo) - dsiVuo) 
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and, using symmetry of the tensors, 

ftfMfi) = s(Co) - ^V a j. G T^ (i t , 0> i t) o) + 2^I^ i0 (^o.it,o) 

= V 2? V 2 » T£ (i t , i t ) + Va <o Tl (x t ,x t ) + 2W zf T* (y« , i ( ) ( A.4) 

+ 2V Z? T£ (yf,x t ) + 2T£(|^ , i t ) + 2l£ (y» , tf) . 

Therefore, letting g t = £y% and r t = J^ t w , we get G^ oWU (t, (r t q t ) T ) as the right 
hand side of (A.4) and G^ w u (t, (r t qt) T ) equal to q t . The initial values are both 
since Uq s and Zq s equal and w, respectively, and, therefore, are not s-dependcnt. 

For the implicit case, we will again compute the r and q parts of G^ V[) w u sepa- 
rately. Let now (yf s) z™ s ) be solutions to (3.3) along the geodesies Xt, s and with IV's 
(w, 0) T , and let (y",z") be solutions to (3.3) along xt and with IV's (u, 0) T . Let also 
Pt, s denote the p-parts of the solutions to (2.2) with initial conditions q and v s , and 
write p t =Pt,o, V? = yf fi , and zf = zf fi . 

Differentiating system (3.3), we get 



d d „,w 
dt di 



+ S^ (T4 (^ )-A(I^ ifl ,V^ i0 ) T p t ,o,i t ,o). 
Using the map A defined in Lemma [A. 11 we have 

— A(T D V m T D ) T - A(T D V u,T D V ™T D V «V u,T D ) T 
Combining the equations, we get 

TtfA = V z ?Ti?( Ptl z?)+Ti?W,z:)+Ti?(p u i£z? fi ) 
+ V z uV zT T£(T£ t (p t ),x t ) + V jL T£(T£(p t ),x t ) 

ds z t,0 

+ V zf T* (T£ (yn - A(T x D t ,V zT T°) T p tl x t ) + V zf T* (T£ (p t ), 
+ V^T* (T£ (j/D - A(T£ V zr r£) T Pt , i t ) 
+ T£(T£(£ y ? t0 ) - A(T x D t ,V zr T x D t f y ?,x t ) 

- T* t (A(T£,V zr T x D t , V z? T x D t , V z uV z? T£) T p t + A(T£, V^ftf. ± t ) 
+ l£(T£ (yn ~ A(T£, V^fpt, z») . 

Substituting j^z™ with r t and j^y™ with g t , we get Gq'$ oWU as the right hand side 
of the equation. Likewise, 



dt 



ii z tfi = £ T x£>(y?,a) - Ts A ( T ? t , , v ^o T i?, ) T i?,o(pt.o) - js T F t l^ z^T Xt o ( Pt ,o) 
- A(r£, v zr r£, v.« r£, v z ?T? t )T? t (p t ) 

A(T£, V zr T£)V z uT£(p t ) - A(T x D t ,V zT T x D t )T x D t (y?) 

A(T£, V z? T°)V zf T°{p t ) - Tgv z uV zf T°{p t ) - T x f V, f l£(y?) . 

Again, after substituting ^y™o w ith q t as above, we get as the right hand 

side of the equation. As for the parametric case, both initial values are zero. 
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□ 



Appendix B. The Projection Gradient. We prove Theorem 14. 11 and, follow- 
ing this, we show how to compute the Hessian of the residual function R Xi)Jl . We will 



need the following result for the proof of Theorem 14. II to show that equation (4.3) is 
independent of the chosen basis. 

Lemma B.l. Let S be an open subset ofR k and U : S -> M fex(fc_1) a C 1 map 
with the property that for any d£S, the columns of the matrix U(v)) constitute 
an orthonormal basis for WL k . Let v? v denote the jth column of U(v). Then for any 
vq G S and m 6 ] 



U: S 



pfe-i 



(aj*4 +iJ*=o,uoy = -(ul ,w). As consequence of this, if 



denotes the map v i— > U(v) J 



vq 
II ' 'i j || 



the 



D 



- 1 for span (u J, 



1) U(V) = -Ik-! 
"do >■ 



in the basis , . . . 

In the proof below, we adopt the notation of section |4.2[ but we will use the 
alternative formulation R x ^(w) = HLog^Exp u;|| 2 for the residual function. 

Proof. [Theorem 14. 1| Extend the basis {v , . . . , v k , wq/||uo||} f° r Vv to an or- 
thonormal basis for T^M . The argument is not dependent on this choice of basis, 
but it will make the reasoning and notation easier. Let S C T^M x V 1 - be an open 
neighborhood of (w , v ) and define the map F v : S — > M. v by 



F v (w,v) 



V V]Rx,^l ' V 

W ■ 



\W ■ U 



(v)J 



Vv V W R X „ 



with the vectors u 1 (w), . . . , u v ^ l ' k+1 ^ (v) constituting an orthonormal basis for for 
each v and with (V v) and U v denoting the matrices having v*,v and u l {v) in the 
columns, respectively. Since (S7 Wo R x ^,v) — d WQ R x ^(v) = for all v <E V Vo because 
wq is a minimizer for R x ^ among vectors in in V Vo , we see that Fy (wo, Vq) vanishes. 
Therefore, if D™ Wo VQ ^Fv is non-singular, the implicit function theorem asserts the 
existence of a map ^ from a neighborhood of vq to T^M with the property that 
Fy(^>(v), v) = for all v in the neighborhood. We then compute 

= D Vo Fv($(v), v) = (Df WQ>Vo) F v ) (D VQ M>(v)) + (D v {w ^ Vg) F v ) 

and hence 



D 



vev,r, 



D (wo,v Q ) F V 



1 / «ey 

(w ,v ) v 



(B.l) 



For the differentials on the right hand side of (B.l), we have 



{w ,vo) v 
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,° ) Hu v ) 



and 

Df WO!Vo) F v = (( v «b) T d^(v«^A = ((h W0 {r.jJv v )) T \ (B 2) 

With the choice of basis, the above matrix is block triangular, 

D (w ,v ) I 'V -In C I , 

with j4 W0)t , equal to H Wo (R x ^\v^ )- The requirement that D™ wg VQ \Fv is non-singular 
is fulfilled, because H Wo (R x ,n\ v v ) has rank k + 1 by assumption and U Vo has rank 
r?-(fc + l). 

Since the first k rows of D, "° ,i<V are zero, we need only the last rj — k columns 



of {D7 Wo „ )-FV) _1 in order to compute (B.l |. The vector w K , p , Uo ,s„ as defined in the 
statement of the theorem is equal to the (k + l)st column. Let E X41VQ s vg be the 
matrix consisting of the remaining rj — (k + 1) columns. Using the form ( B.3 ) , we have 

r „ — ( —Hwo (Rv;,n\v vo ) B WQtVo C Wg V | 

Assume {u 1 , . . . , u J } is chosen such that {m 1 (wo), ■ ■ ■ , uJ ( w o)} equals the previously 
chosen basis for V^. With this assumption, C WOtVo is the identity matrix I v _^ + iy. 
In addition, let Wq +1 denote the (k + l)st component of wo, that is, the projection of 
Wq onto Vq/ II «o || • Since w Q € K, and by choice of U v , Lemma lB.ll gives 



Therefore, 

D ( Wo Z) F v={0 ••■0 V W0 *°R X ^ -w k+1 I, 



J r;-(fe+l) • 



T 

' "-^-(fc+l) 



Note, in particular, that D, v ) F v is independent on the actual choice of bases U v 



Combining the equations, we get 



Because Exp \I/(t;) = ns v (x), we get (4.3). 
□ 

Lets now compute second derivatives, and thereby the Hessian, of the residual 
function R x ^. Choose wq, v € T^M and let y = Exp^wo- Working in the orthonormal 
basis, we have 

V™ fl Il(1 = 2 ((DyLogJ D W0 Exp^) T Log x y . 

and hence 

3J (V w +vsRx,fi) \s=0 

= 2 (fs ( D E* Pl ,(w + S v)Log x ) | s=0 (A^Expj) Log x y 

T ( B - 4 ) 
+ 2 ((A,Log x ) ^ (D Wo+l , s ExpJ | s=0 ) Log^y 

+ 2 ((Z^LogJ (D^Exp^)) 7, £ (Log^Exp^Oo + sv)) | s=0 . 
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Note that 



d_ 

ds 



'Log a .Exp M (u; + sv)) | s=0 = (-D^LogJ (A^Exp^) v 



Using that j^(A s 1 ) = A s 1 (j^A s )A s 1 for a time depedent, invertible matrix ^l^and 
the fact that Exp^Log^z = z for all z, we get 



j- s (•C>E X p (l («. n + sl ,)Log ;) 



|s=0 



g x (Exp t«o+™) Ex Px ) |s=0 



= - (£>j,Log x ) £ (^Log^Exp^o + ^ExPz) |«=o (A/LogJ . 

The middle term of this product and the term ^ (iJ^+g^Exp^) | s= q in (B.4) are both 
computable using Theorem 13.41 
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